Goto

Collaborating Authors

 execution skill


Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Xie, Yuquan, Li, Zaijing, Shao, Rui, Chen, Gongwei, Zhou, Kaiwen, Li, Yinchuan, Jiang, Dongmei, Nie, Liqiang

arXiv.org Artificial Intelligence

Recent efforts to leverage the Multi-modal Large Language Model (MLLM) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to tackle the issue of insufficient knowledge. It progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we constructed a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32\%, 19\%, 15\%, and 79\% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: https://cybertronagent.github.io/Mirage-1.github.io/


Approximate Estimation of High-dimension Execution Skill for Dynamic Agents in Continuous Domains

Nieves-Rivera, Delma, Archibald, Christopher

arXiv.org Artificial Intelligence

In many real-world continuous action domains, human agents must decide which actions to attempt and then execute those actions to the best of their ability. However, humans cannot execute actions without error. Human performance in these domains can potentially be improved by the use of AI to aid in decision-making. One requirement for an AI to correctly reason about what actions a human agent should attempt is a correct model of that human's execution error, or skill. Recent work has demonstrated successful techniques for estimating this execution error with various types of agents across different domains. However, this previous work made several assumptions that limit the application of these ideas to real-world settings. First, previous work assumed that the error distributions were symmetric normal, which meant that only a single parameter had to be estimated. In reality, agent error distributions might exhibit arbitrary shapes and should be modeled more flexibly. Second, it was assumed that the execution error of the agent remained constant across all observations. Especially for human agents, execution error changes over time, and this must be taken into account to obtain effective estimates. To overcome both of these shortcomings, we propose a novel particle-filter-based estimator for this problem. After describing the details of this approximate estimator, we experimentally explore various design decisions and compare performance with previous skill estimators in a variety of settings to showcase the improvements. The outcome is an estimator capable of generating more realistic, time-varying execution skill estimates of agents, which can then be used to assist agents in making better decisions and improve their overall performance.


Hustling in Repeated Zero-Sum Games with Imperfect Execution

Archibald, Christopher (Stanford University) | Shoham, Yoav (Stanford University)

AAAI Conferences

We study repeated games in which players have imperfect execution skill and one player's true skill is not common knowledge. In these settings the possibility arises of a player "hustling", or pretending to have lower execution skill than they actually have. Focusing on repeated zero-sum games, we provide a hustle-proof strategy; this strategy maximizes a player's payoff, regardless of the true skill level of the other player.